22 Jan 2019

Points from Thursday

  • Data affected by a perturbation is problematic for arima(), Arima().
  • Seasonal ARIMA has effect of Jan (or Feb …) in year t on Jan (or Feb …) in year t+1. Not typical when working with population data.
  • Removing the mean season is different than a seasonal difference.
  • Data with multiple seasons (daily, monthly, yearly) will be problematic for standard ARIMA seasonal models.

Week 3: State-Space Model

Weeks 1-3.5: building blocks for analysis of multivariate time-series data with observation error, structure, and missing values

Week 3.5+: putting this all together to start analyzing ecological data sets


  • Matrix math (multivariate)
  • Properties of time series data
  • AR and MA models: \(x(t) = b_1 x_{t-1} + b_2 x_{t-2} + e(t)\)
  • Today: State-space models (observation error and missing values)
  • Model evaluation and selection

Univariate linear state-space model

\[x_t = x_{t-1}+u+w_t, \,\,\, w_t \sim N(0,q)\] \[y_t = x_t + v_t, \,\,\, v_t \sim N(0,r)\] The \(x\) model is the classic “random walk”.

\(y\) are the observatons.

This model is a random walk observed with (Gaussian) error.

Univariate linear state-space model

\[x_t = x_{t-1}+u+w_t, \,\,\, w_t \sim N(0,q)\] \[y_t = x_t + v_t, \,\,\, v_t \sim N(0,r)\]

There are many textbooks on this class of model. It is used in extensively in economics and engineering.

AR(1) or AR lag-1

All of these are examples:

\[x_t = x_{t-1} + u + w_t\] \[x_{t+1} = x_t + w_t\] \[x_t = b x_{t-1} + u + w_t\]

Gompertz model

Addition of \(b\) with \(0<b<1\) leads to process model with mean-reversion.

In the ecological literature on density-dependent processes, you may see this in non-log notation:

\[N_t = exp(u+w_t)N^b_{t-1}\]

\(N_t\) is population size.

Gompertz model

Take the log, and we have

\[x_t = bx_{t-1}+u+w_t\] \[w_t \sim N(0,q)\] It is not required that \(w_t\) is Gaussian but that is a common assumption. Dynamics of processes with non-Gaussian errors, esp long-tailed errors, is a common extension. Autocorrelated errors could be implemented with MA process or covariates.

Gompertz model

Why is the AR-1 model so important in analysis of ecological data?

Additive random walks

\[x_t = x_{t-1}+u+w_t, \,\,\, w_t \sim N(0,q)\]

  • Movement, changes in gene frequency, somatic growth if growth is by fixed amounts

  • Why Gaussian? The average of many small perturbations, regardless of their distribution, is Gaussian.

Why is the AR-1 model so important in analysis of ecological data?

Multiplicative random walks

\[n_t = \lambda n_{t-1}e_t, \,\,\, log(e_t) \sim N(0,q)\]

  • Population growth, somatic growth if growth is by percentage

  • Take the log and you get the linear additive model above. log-normal error distribution means that 10% increase is as likely as 10% decrease

Simple model, great flexibility

An AR-1 random walk can show a wide-range of trajectories, even for the same parameter values. All trajectories below came from the same random walk model: \(x_t= x_{t-1} -0.02+w_t\), \(w_t \sim N(mean=0.0, var=0.01)\).

Simple model, great flexibility

An AR-1 random walk can show a wide-range of trajectories, even for the same parameter values.

Accurately estimating and conveying the uncertainty in the parameter estimates, state estimates, and predictions is critical.

Definition: state-space

The “state” is a hidden (dynamical) variable. In this class, it is a hidden random walk or AR(1) process.

Our data are observations of this hidden state.

Often state-space models include inputs (explanatory variables) and typically the state or the data are multivariate.

The model you are seeing today is a simple univariate state-space model with no inputs.

state: \(x_t = x_{t-1} + u + w_t\)

observation: \(y_t = x_t + v_t\)

Example: population count data

Yearly, usually, population or subpopulation counts, possibly with missing values.

Example: population count data

The data are observations of a hidden ‘true’ population size. The data are observations of that hidden state and have observation error.

Observation error

This is a survey photograph for Steller sea lions in the Gulf of Alaska. There IS some number of sea lions in our population in year \(t\), but we don’t know that number precisely. It is “hidden”.

The observation error variance is often unknowable in fisheries and ecological analyses

Sightability varies due to factors that may not be fully understood or measureable

  • Environmental factors (tides, temperature, etc.)

  • Population factors (age structure, sex ratio, etc.)

  • Species interactions (prey distribution, prey density, predator distribution or density, etc.)

Sampling variability–due to how you actually count animals–is just one component of observation variance

Process versus observaton variability

Suppose we have the following data (population density logged)

Fit a linear regression

The model of the hidden state in this case is \(x_t = \alpha + \beta t\). The observation model is \(y_t=x_t+v_t\). All variability = non-process or observation variability.

Fit an AR(1) model

The model of the hidden state in this case is \(x_t = \alpha + x_{t-1}+w_t\). The observation model is \(y_t=x_t\). All variability = process variability.

Fit a state-space model

Autoregressive state-space models fit a random walk AR(1) through the data. The variabilty in the data contains both process and non-process (observation) variability.

Non-process variability

Observation or “non-process” error is the difference between the hidden state (blue line) and the observation (X).

Process variability

Process error is the difference between the expected \(x_t\) given data up to time \(t-1\) (x in the plot) and the actual \(x\) at time \(t\).

PVA example

One use of univariate state-space models is “count-based” population viability analysis (chap 7 HWS2014)

How you model your data has a large impact on your forecasts

How can we separate process and non-process variance?

Wouldn’t these two variances be impossible to separate?

They have different temporal patterns.